Currently, data scientists have been called to the front lines as they analyze data from the COVID-19 pandemic. It isn’t hyperbolic to say that data scientists have saved lives, but in this trying time they also entertain us. As we all cocoon ourselves in our homes, data scientists refine our Netflix recommendations, identify ISP outages in real time, and keep toilet paper traveling to the stores where people need it rather than to stores with hoarders. In that spirit, we, as data scientists, chose to perform the following analysis for entertainment’s sake.
Students across the country have left their college campuses to embrace new online learning communities. This transition has not been easy, and school spirit is probably not at an all-time high. One of the iconic representations of school spirit is the college fight song. The following analysis is a data exploration of college fight songs from the Power 5 schools (plus Notre Dame).
A dataset containing college fight songs was acquired from FiveThirtyEight.com. It included variables such as the school, the author, the year the song was written, beats per minute, length, and lyric cliches from the songs. The original article by FiveThirtyEight.com allowed readers to select a school, view it on a graph comparing its song's length and speed with other colleges’ songs, and see the cliches in its lyrics. This was a great jumping-off point for our analysis. To the original dataset we added four more variables and merged it with a dataset containing university demographic data.
The first variables we added were the 2019 football wins and losses for the schools in our fight song dataset. This data was obtained from ncaa.com. The next variables came from niche.com. Niche is a site that provides university information to applying students. The site also creates rankings and letter grades for schools on a wide variety of topics. We chose to utilize their party school rankings and athletic rankings. A school ranked number one is the best in that particular category.
Another valuable source of college information is the Integrated Postsecondary Education Data System (IPEDS). This data is provided by the National Center for Education Statistics. The IPEDS data can be explored via their website, and customized datasets can be downloaded. The IPEDS dataset we utilized was one created and shared on Kaggle. Merging this dataset with the fight song data proved to be a challenge.
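To illustrate the kind of merge involved, here is a minimal sketch in plain Python. It assumes the difficulty was that the two sources spell school names differently, which is a common problem but is not something we have documented here. The field names (`school`, `name`, `enrollment`), the sample rows, and the `crosswalk` lookup are all hypothetical and are not taken from either dataset.

```python
def normalize(name):
    """Lowercase a name and replace punctuation with spaces so spellings compare cleanly."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def merge_on_school(songs, ipeds, crosswalk):
    """Join song rows to IPEDS-style rows by school name.

    crosswalk maps a short school name to the longer official name
    (a hand-built, hypothetical lookup). Returns the merged rows plus a
    list of schools that could not be matched, so they can be fixed by hand.
    """
    ipeds_by_name = {normalize(row["name"]): row for row in ipeds}
    merged, unmatched = [], []
    for song in songs:
        target = crosswalk.get(song["school"], song["school"])
        match = ipeds_by_name.get(normalize(target))
        if match is None:
            unmatched.append(song["school"])
        else:
            merged.append({**match, **song})
    return merged, unmatched


# Made-up rows for illustration only; not real schools or real values.
songs = [
    {"school": "Example Tech", "bpm": 140},
    {"school": "Sample State", "bpm": 90},
    {"school": "Mystery U", "bpm": 100},
]
ipeds = [
    {"name": "Example Institute of Technology-Main Campus", "enrollment": 1000},
    {"name": "sample state", "enrollment": 2000},
]
crosswalk = {"Example Tech": "Example Institute of Technology-Main Campus"}

merged, unmatched = merge_on_school(songs, ipeds, crosswalk)
print(len(merged), "rows merged; unmatched:", unmatched)
```

Returning the unmatched schools separately, rather than silently dropping them, makes it easy to see which names still need an entry in the crosswalk.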
Throughout the process of acquiring the data, merging it into one dataset, and finding new sources, we kept a list of questions that we wanted to explore. Some of the most interesting were:
After our data wrangling we set off to find some answers.
The above interactive plot was inspired by FiveThirtyEight’s original analysis. The original plot graphed song length against speed. To build on that, we made the plot interactive so that the song titles and schools could be viewed. We also color-coded the data points by the athletic division of the school.
From this graph we see that the longest song is the Aggie War Hymn for Texas A&M. Auburn has one of the shortest songs, and it is also on the slow end. Colorado and Oklahoma also have some of the shortest songs, but theirs are a bit faster. Most songs tend to be short and fast, with a cluster of slower songs.
This data was used to create a new categorical variable in our dataset. We divided both the bpm and the length into two even halves, creating four quadrants: short and fast, short and slow, long and fast, and long and slow.
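The quadrant labeling can be sketched as follows. This is an illustration under stated assumptions, not the exact code used in the analysis: it assumes "even halves" means splitting at the median of each variable, and it assumes the columns are named `bpm` and `sec_duration`. The rows are made-up values, not real songs.

```python
from statistics import median


def label_quadrants(songs):
    """Add a 'quadrant' label to each song based on median splits.

    Assumption: the cut points are the medians of bpm and duration.
    Songs above a median are 'fast' or 'long'; the rest are 'slow' or 'short'.
    """
    bpm_cut = median(s["bpm"] for s in songs)
    length_cut = median(s["sec_duration"] for s in songs)
    labeled = []
    for s in songs:
        length = "long" if s["sec_duration"] > length_cut else "short"
        speed = "fast" if s["bpm"] > bpm_cut else "slow"
        labeled.append({**s, "quadrant": f"{length} and {speed}"})
    return labeled


# Hypothetical rows for illustration only.
sample = [
    {"school": "School A", "bpm": 150, "sec_duration": 50},
    {"school": "School B", "bpm": 70, "sec_duration": 45},
    {"school": "School C", "bpm": 160, "sec_duration": 120},
    {"school": "School D", "bpm": 80, "sec_duration": 110},
]

labeled = label_quadrants(sample)
for row in labeled:
    print(row["school"], "->", row["quadrant"])
```

A median split puts roughly the same number of songs on each side of a cut. Splitting at the midpoint of the range instead would give halves of equal width but not necessarily equal counts.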
Next, we wanted to visualize where and when each song was written, and whether a student wrote it.